Evaluating Overlapping Communities 
with the Conductance of their Boundary Nodes 



Frank Havemann 1,*, Jochen Gläser 2, Michael Heinz 1, Alexander Struck 1






1 Institut für Bibliotheks- und Informationswissenschaft, Humboldt-Universität zu Berlin, Berlin, Germany

2 Zentrum Technik und Gesellschaft, Technische Universität Berlin, Berlin, Germany

* E-mail: Frank (dot) Havemann (at) ibi.hu-berlin.de



Abstract 

Usually the boundary of a community in a network is drawn between nodes and thus crosses its outgoing links. If we construct overlapping communities by applying the link-clustering approach, nodes and links interchange their roles. Therefore, boundaries must be drawn through the nodes shared by two or more communities. For the purpose of community evaluation we define a conductance of boundary nodes of overlapping communities analogously to the graph conductance of boundary-crossing links used to partition a graph into disjoint communities. We show that the conductance of boundary nodes (or normalised node cut) can be deduced from the ordinary graph conductance of disjoint clusters in the network's weighted line graph, which was introduced by Evans and Lambiotte (2009) to obtain overlapping communities of nodes in the original network. We test whether our definition can be used to construct meaningful overlapping communities with a local greedy algorithm of link clustering. In this note we present encouraging results obtained for Zachary's karate-club network and for a larger network of 492 information-science papers and the sources cited in these papers.



1 Introduction 

A community in a network is usually defined as a subgraph that is both cohesive and well separated from the rest of the network (Fortunato 2010). Cohesion and separation can be evaluated by various measures. A simple absolute measure of cohesion is the sum of weights of links between all members of a subgraph with node set $C$, which equals their total internal degree $k_{in}(C)$ divided by 2. A simple absolute measure of separation is the sum of weights of links between members and non-members, which equals the total external degree $k_{out}(C)$. A single function sensitive to a subgraph's cohesion and separation is the normalised cut

$$\Phi(C) = \frac{k_{out}(C)}{k_{in}(C) + k_{out}(C)}.$$

The total external degree $k_{out}(C)$ equals the total electrical conductance of the links cut by $C$'s boundary if each link's conductance is defined by its weight. Therefore $\Phi(C)$ is also named conductance. 1 If the cut through external links of a subgraph with node set $C$ has minimal conductance $\Phi(C)$ then $C$ can be called a community. Yang and Leskovec (2012) tested 13 evaluation functions and found that evaluating subgraphs with conductance $\Phi$ results in good disjoint communities.
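The normalised cut can be illustrated with a minimal Python sketch for unweighted, undirected graphs. The function name `conductance` and the example graph (two triangles joined by one bridging link) are assumptions made for illustration only; they do not come from the paper.

```python
from fractions import Fraction

def conductance(edges, members):
    """Normalised cut Phi(C) = k_out(C) / (k_in(C) + k_out(C)), unweighted case."""
    members = set(members)
    k_in = 0   # total internal degree: each internal link is counted twice
    k_out = 0  # total external degree: each boundary-crossing link is counted once
    for u, v in edges:
        inside = (u in members) + (v in members)
        if inside == 2:
            k_in += 2
        elif inside == 1:
            k_out += 1
    return Fraction(k_out, k_in + k_out)

# Hypothetical example: two triangles joined by the single link (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(conductance(edges, {0, 1, 2}))  # 1/7: one cut link, internal degree 6
```

Exact fractions are used so that values can be compared without rounding; the function assumes that the node set touches at least one link.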

These ideas can also be applied when the perspective on the network is changed and communities of links instead of communities of nodes are to be constructed. This approach was introduced by Evans and Lambiotte (2009) and by Ahn, Bagrow, and Lehmann (2010) with the aim of obtaining overlapping node communities. 2 The definition of a community as a cohesive and separated subgraph still holds for
1 $\Phi(C)$ is called conductance only if the total degree $k(C) = k_{in}(C) + k_{out}(C)$ is smaller than the total degree of $C$'s complement; see the review by Fortunato (2010) and references therein.

2 Link clustering is advantageous for some tasks of community detection. For example, if communities representing thematic structures in networks of papers are to be constructed, the focus on links is to be preferred because citation links between papers are thematically more homogeneous than papers themselves (Havemann, Gläser, Heinz, and Struck 2012).

this perspective. The important difference between communities of nodes and communities of links is that the latter's boundaries cut through nodes instead of links. When community boundaries shift from links to nodes, the evaluation function needs to be changed accordingly. In this paper, we introduce the normalised node cut $\Psi$ as an evaluation function for link communities.

Evans and Lambiotte (2009) found that a network's line graph can be used to obtain link communities. They applied modularity, a global evaluation function, to obtain link communities but stressed that any method for community construction can be applied to the line graph. We show that evaluating a network's subgraphs with the normalised node cut $\Psi$ is equivalent to an evaluation of subgraphs in the network's line graph with the ordinary normalised edge cut $\Phi$ if each edge in the line graph is weighted with the inverse degree of the corresponding node as proposed by Evans and Lambiotte (2009).

We test whether the normalised node cut $\Psi$ can be used to evaluate subgraphs and thus to find a network's link communities. For this purpose we construct a $\Psi$-landscape. A community of links is defined as a subgraph with a local $\Psi$-minimum. We apply a greedy local expansion algorithm to find local minima in the $\Psi$-landscape. We present results obtained for a simple benchmark, the karate-club network analysed by Zachary (1977), and first results for a larger network of 492 information-science papers and their cited sources.



2 Method 

2.1 Normalised Node Cut 

Since the boundary of a link community consists of nodes, the measure of cohesion and separation must be shifted accordingly. We define the normalised node cut $\Psi(C)$ of a connected subgraph with node set $C$ as the normalised total conductance of $C$'s boundary nodes given by

$$\Psi(C) = \frac{1}{k_{in}(C)} \sum_{i \in C} \frac{1}{1/k_i^{in}(C) + 1/k_i^{out}(C)} = \frac{1}{k_{in}(C)} \sum_{i \in C} \frac{k_i^{in}(C)\, k_i^{out}(C)}{k_i}. \tag{1}$$

Here $k_i^{in}(C)$ is the internal degree of node $i$, i.e. the total weight of its links to other $C$-members, and $k_i^{out}(C)$ its external degree, i.e. the total weight of its links to non-members. Both sum up to the total degree of node $i$, which does not depend on $C$: $k_i = k_i^{in}(C) + k_i^{out}(C)$.

Each term in the sum of equation 1 equals the electrical conductance between external and internal nodes connected through node $i$ if we identify the link weights with electrical conductances. Since the external degree $k_i^{out}(C)$ of inner nodes is zero, the sum includes only the nodes that constitute $C$'s boundary.

The normalised node cut $\Psi$ is defined for weighted networks. For the sake of simplicity, we restrict the discussion to unweighted networks in the remainder of the paper. In this case, $k_i^{in}(C)$ equals the number of links between node $i$ and other members of $C$, and $k_i^{out}(C)$ is the number of links between $i$ and nodes outside $C$.

For the complete graph (the whole network), which has no outgoing links, $\Psi = 0$ because $k_i^{out} = 0$ for all nodes. Furthermore, the normalisation in equation 1 guarantees that $\Psi < 1$ for all subgraphs because $\Psi$ equals the $k_i^{in}$-weighted average of relative external degrees $k_i^{out}/k_i < 1$. 3

Function $\Psi(C)$ decreases with increasing internal cohesion (measured by $k_{in}(C)$) and with decreasing linkage with the rest of the network (measured by the sum in equation 1). Thus, $\Psi(C)$ is a function sensitive to a subgraph's cohesion and separation.
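Equation 1 can be evaluated directly from an edge list. The following sketch is one possible reading for unweighted, undirected graphs; the function name `normalised_node_cut` and the five-node example graph (two triangles sharing node 2) are illustrative assumptions.

```python
from fractions import Fraction

def normalised_node_cut(edges, members):
    """Psi(C) = (1 / k_in(C)) * sum_i k_i_in * k_i_out / k_i  (equation 1, unweighted)."""
    members = set(members)
    degree = {}     # total degree k_i of every node
    k_in_node = {}  # internal degree k_i_in(C) of every member with internal links
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if u in members and v in members:
            k_in_node[u] = k_in_node.get(u, 0) + 1
            k_in_node[v] = k_in_node.get(v, 0) + 1
    k_in_total = sum(k_in_node.values())  # k_in(C), every internal link counted twice
    sigma = Fraction(0)
    for i, k_in in k_in_node.items():
        k_out = degree[i] - k_in          # k_i = k_i_in + k_i_out
        sigma += Fraction(k_in * k_out, degree[i])
    return sigma / k_in_total

# Assumed example graph: two triangles sharing node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
print(normalised_node_cut(edges, {0, 1}))           # 1/2
print(normalised_node_cut(edges, {0, 1, 2}))        # 1/6
print(normalised_node_cut(edges, {0, 1, 2, 3, 4}))  # 0
```

Inner nodes contribute nothing to the sum because their external degree is zero, so only boundary nodes matter. The sketch raises a division error for node sets without internal links, for which equation 1 is not defined.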

Our definition of the normalised node cut $\Psi$ can be derived by applying the normalised edge cut $\Phi$ in the network's line graph with weights $1/k_i$. This weighting was proposed by Evans and Lambiotte (2009). To construct a network's line graph we first define an auxiliary bipartite graph obtained by putting a node on each link of the original network. The affiliation matrix $B$ of the bipartite graph (also called its incidence matrix) has a row for each of the $n$ original nodes and a column for each of the $m$ original links. Each link column contains only two non-zero elements, namely the elements in the rows of the nodes $i$ and $j$ connected by the link. We can project the bipartite graph back onto the original network with the product $BB^T$, which equals its adjacency matrix $A$ (except for the main diagonal).

3 One of the 13 evaluation functions tested by Yang and Leskovec (2012) is the unweighted average of relative external degrees (they call it average out degree fraction).

We obtain the network's line graph by the opposite projection $B^T B$ of the bipartite graph. Evans and Lambiotte (2009) emphasise that the line graph contains the same amount of information as the original network in all cases of practical interest. Knowing $B^T B$ we can almost always calculate $BB^T$ and thus also the network's adjacency matrix $A$.
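The two projections of the auxiliary bipartite graph can be tried out on a small example. The sketch below uses plain nested lists; the helper names and the example graph (two triangles sharing node 2) are assumptions for illustration.

```python
def incidence_matrix(n, links):
    """Return the n x m affiliation (incidence) matrix B as nested lists."""
    B = [[0] * len(links) for _ in range(n)]
    for k, (i, j) in enumerate(links):
        B[i][k] = 1  # link k is attached to node i ...
        B[j][k] = 1  # ... and to node j
    return B

def transpose(X):
    return [list(row) for row in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# Assumed example graph: two triangles sharing node 2.
links = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
B = incidence_matrix(5, links)

node_projection = matmul(B, transpose(B))  # B B^T: adjacency matrix A off the diagonal
link_projection = matmul(transpose(B), B)  # B^T B: unweighted line graph off the diagonal

print(node_projection[0][1], node_projection[0][3])  # 1 0 (nodes 0 and 3 are not linked)
print(node_projection[2][2])                         # 4 (the diagonal holds node degrees)
print(link_projection[0][1], link_projection[0][5])  # 1 0 (links 0 and 5 share no node)
```

The diagonal of $BB^T$ holds the node degrees and the diagonal of $B^T B$ holds the value 2 for every link, which is why both projections agree with the respective adjacency matrices only off the main diagonal.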

Evans and Lambiotte (2009) weight the edges of the line graph with the inverse degree $1/k_i$ of the node $i$ in the original network because each node is represented as a clique in the line graph. They define the line graph's adjacency matrix as

$$E_{kl} = \sum_{i=1}^{n} \frac{B_{ik} B_{il}}{k_i}. \tag{2}$$

For this line graph we can calculate the ordinary graph conductance or normalised cut $\Phi$ of a link set $L$ and get $\Phi(L) = \Psi(C(L))$, where $C(L)$ is the set of nodes attached to links in $L$. The proof can be found in appendix A.1.
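The identity $\Phi(L) = \Psi(C(L))$ can be checked numerically on a small graph. The sketch below builds the weighted line graph of equation 2, including its diagonal terms, and compares both cuts for link sets that contain all links between their nodes. All function names and the example graph (two triangles sharing node 2) are assumptions for illustration.

```python
from fractions import Fraction

def node_degrees(n, links):
    degree = [0] * n
    for i, j in links:
        degree[i] += 1
        degree[j] += 1
    return degree

def weighted_line_graph(n, links):
    """E_kl = sum_i B_ik B_il / k_i (equation 2), diagonal terms k = l included."""
    degree = node_degrees(n, links)
    m = len(links)
    E = [[Fraction(0)] * m for _ in range(m)]
    for k in range(m):
        for l in range(m):
            for i in set(links[k]) & set(links[l]):  # nodes shared by links k and l
                E[k][l] += Fraction(1, degree[i])
    return E

def edge_cut(E, L):
    """Ordinary normalised cut Phi(L) of a set of line-graph nodes (original links)."""
    outside = set(range(len(E))) - set(L)
    K_in = sum(E[k][l] for k in L for l in L)
    K_out = sum(E[k][l] for k in L for l in outside)
    return K_out / (K_in + K_out)

def node_cut(n, links, C):
    """Normalised node cut Psi(C) of equation 1 for an unweighted graph."""
    degree = node_degrees(n, links)
    k_in = [0] * n
    for i, j in links:
        if i in C and j in C:
            k_in[i] += 1
            k_in[j] += 1
    sigma = sum(Fraction(k_in[i] * (degree[i] - k_in[i]), degree[i]) for i in C)
    return sigma / sum(k_in)

# Assumed example graph: two triangles sharing node 2.
links = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
E = weighted_line_graph(5, links)
L = [0, 1, 2]                                # the three links of the left triangle
C = {node for k in L for node in links[k]}   # induced node set C(L) = {0, 1, 2}
print(edge_cut(E, L), node_cut(5, links, C))  # 1/6 1/6
```

The comparison is only meaningful for complete link sets, that is, when $L$ contains every existing link between the nodes of $C(L)$.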

Different link sets $L$ can have the same induced node set $C(L)$ if we define $C(L)$ to be the set of all nodes attached to links in $L$. We define the link set $L(C)$ induced by $C$ as the maximum set of links (existing in the network) that induces $C$. A connected subgraph's link set is assumed to be a maximum set. It is induced by the subgraph's node set. If we did not include all existing links between all nodes of a subgraph in its link set, we would have external links between member nodes. We could even change two adjacent inner nodes to boundary nodes by omitting the link connecting them from the subgraph's link set. We are looking for cohesive subgraphs, and any subgraph with an incomplete link set is less cohesive than one with a complete set $L(C)$.
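The two induced sets can be written down in a few lines. This is a minimal sketch with assumed helper names (`induced_nodes`, `induced_links`) and an assumed example graph of two triangles sharing node 2.

```python
def induced_nodes(link_set):
    """C(L): all nodes attached to links in L."""
    return {node for link in link_set for node in link}

def induced_links(edges, members):
    """L(C): the maximum link set inducing C, i.e. all existing links between members."""
    return {(u, v) for u, v in edges if u in members and v in members}

# Assumed example graph: two triangles sharing node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]

incomplete = {(0, 1), (1, 2)}         # a path that omits the existing link (0, 2)
complete = {(0, 1), (0, 2), (1, 2)}   # the full triangle

print(induced_nodes(incomplete) == induced_nodes(complete))  # True: both induce {0, 1, 2}
print(induced_links(edges, {0, 1, 2}) == complete)           # True: L(C) is the maximum set
```

With the incomplete link set, node 0 would count as a boundary node because its link to node 2 would be treated as external, which is the situation the convention of complete link sets avoids.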

Weighting the line graph's edges with the inverse degrees of nodes in the original network is equivalent to a Euclidean normalisation of the nodes' vectors in the affiliation matrix $B$ of the auxiliary bipartite graph. This becomes clear if we factorise the terms of the sum in equation 2:

$$E_{kl} = \sum_{i=1}^{n} \frac{B_{ik}}{\sqrt{k_i}} \frac{B_{il}}{\sqrt{k_i}}. \tag{3}$$



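Then we can write $E = D^T D$ for short, with $D_{ik} = B_{ik}/\sqrt{k_i}$, and verify the Euclidean normalisation of the $n$ row vectors of $D$:

$$\sum_{k=1}^{m} D_{ik}^2 = \sum_{k=1}^{m} \frac{B_{ik}^2}{k_i} = \frac{1}{k_i} \sum_{k=1}^{m} B_{ik} = 1. \tag{4}$$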

Here we used that $B$ is binary (because $A$ is binary for unweighted networks) and therefore $B_{ik}^2 = B_{ik}$.

The projection of the normalised bipartite graph described by affiliation matrix $D$ back onto a network of the original nodes is given by $DD^T$. Any element of the adjacency matrix $DD^T$ (except for the main diagonal) is given by

$$\sum_{k=1}^{m} D_{ik} D_{jk} = \frac{A_{ij}}{\sqrt{k_i k_j}}. \tag{5}$$



This means that the Euclidean normalisation of B's 
row vectors is equivalent to weighting each link in 
the original (unweighted) network with the geomet- 
ric mean of its nodes' inverse degrees. The weighted 
graph described by adjacency matrix E is not the 
line graph of the unweighted network described by 
adjacency matrix A but the line graph of the net- 
work weighted according to equation 5. The approach 
is applicable only to those networks for which the 
weighting with the geometric means is a realistic as- 
sumption. 
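Equations 4 and 5 can be verified numerically. The sketch below uses floating-point numbers because of the square roots; the function names and the example graph (two triangles sharing node 2) are assumptions for illustration.

```python
import math

def normalised_incidence(n, links):
    """Return D with D_ik = B_ik / sqrt(k_i), together with the node degrees."""
    degree = [0] * n
    for i, j in links:
        degree[i] += 1
        degree[j] += 1
    D = [[0.0] * len(links) for _ in range(n)]
    for k, (i, j) in enumerate(links):
        D[i][k] = 1 / math.sqrt(degree[i])
        D[j][k] = 1 / math.sqrt(degree[j])
    return D, degree

def projected_weight(D, i, j):
    """Element (i, j) of D D^T, i.e. the scalar product of rows i and j of D."""
    return sum(a * b for a, b in zip(D[i], D[j]))

# Assumed example graph: two triangles sharing node 2.
links = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
D, degree = normalised_incidence(5, links)

row_norms = [sum(x * x for x in row) for row in D]  # equation 4: each should equal 1
print(all(math.isclose(r, 1.0) for r in row_norms))  # True
# Equation 5 for the linked nodes 0 and 2 with degrees 2 and 4: 1 / sqrt(2 * 4)
print(math.isclose(projected_weight(D, 0, 2), 1 / math.sqrt(degree[0] * degree[2])))  # True
```

For nodes that are not linked, such as 0 and 3 in this example, the projected weight is zero, in line with $A_{ij} = 0$.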

2.2 Defining Communities 

Local greedy algorithms construct communities by starting from seeds and adding those neighbours of the subgraph which maximally improve or minimally downgrade its cohesion and separation. Due to its locality, this approach can be used to construct a seed's nested sequence of communities in a network which is too large to be processed as a whole (Clauset 2005; Luo, Wang, and Promislow 2008; Havemann, Heinz, Struck, and Gläser 2011). In the case of link communities, the local greedy expansion of subgraphs should start from links as seeds and iteratively add new links.

We construct a landscape of the normalised node cut $\Psi(C)$ of connected subgraphs. Each connected subgraph consists of a node set $C$ and the link set $L(C)$ containing all links that exist between nodes in $C$. Each place in the landscape represents a connected subgraph defined by its node set $C$. A relation between two places exists if one of the subgraphs can be obtained by adding a node to the other one. The height of a place is the subgraph's $\Psi$-value. 4 Communities can be defined as those subgraphs whose $\Psi$-values are local minima in the $\Psi$-landscape. 5

Any two places in the $\Psi$-landscape are connected by paths. To reach one place from another one we have to add and remove nodes of the corresponding subgraphs. The absolute distance between two places in the $\Psi$-landscape is the number of steps on a shortest path between them. These are $|M \cup N| - |M \cap N|$ steps, where $M$ and $N$ are the node sets of the two connected subgraphs. We obtain the Jaccard distance by normalising the absolute distance with $|M \cup N|$.
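Both distances follow directly from the set sizes. A minimal sketch, with assumed function names and two assumed node sets that share one node:

```python
from fractions import Fraction

def absolute_distance(M, N):
    """Number of node additions and removals: |M u N| - |M n N|."""
    return len(M | N) - len(M & N)

def jaccard_distance(M, N):
    """Absolute distance normalised with |M u N|."""
    return Fraction(absolute_distance(M, N), len(M | N))

# Assumed example: two node sets overlapping in node 2.
M, N = {0, 1, 2}, {2, 3, 4}
print(absolute_distance(M, N))  # 4: nodes 0 and 1 are removed, nodes 3 and 4 are added
print(jaccard_distance(M, N))   # 4/5
```

The Jaccard distance is 0 for identical node sets and 1 for disjoint ones.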

The distance between two communities can be used to define a community's stability. A community's stability is the shortest Jaccard distance to a community with a lower $\Psi$-value. A community is more cohesive and better separated from the rest of the network than all other communities within the radius of the shortest Jaccard distance to a community with a lower $\Psi$-value.
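The stability definition can be sketched for a given list of communities. The function name `stability`, the dictionary layout, and the convention of returning `None` for a community without any lower-lying competitor are assumptions; the three example communities are the two triangles and the whole graph of a five-node graph in which two triangles share node 2.

```python
from fractions import Fraction

def jaccard_distance(M, N):
    union = M | N
    return Fraction(len(union) - len(M & N), len(union))

def stability(name, communities):
    """Shortest Jaccard distance from community `name` to a community with lower Psi.

    `communities` maps a name to a pair (node set, Psi-value).
    Returns None if no community has a lower Psi-value (assumed convention).
    """
    nodes, psi = communities[name]
    distances = [jaccard_distance(nodes, other_nodes)
                 for other, (other_nodes, other_psi) in communities.items()
                 if other != name and other_psi < psi]
    return min(distances) if distances else None

# Assumed example: two triangles sharing node 2, plus the whole graph.
communities = {
    "left": ({0, 1, 2}, Fraction(1, 6)),
    "right": ({2, 3, 4}, Fraction(1, 6)),
    "whole": ({0, 1, 2, 3, 4}, Fraction(0)),
}
print(stability("left", communities))   # 2/5, the distance to the whole graph
print(stability("whole", communities))  # None
```

The right triangle does not limit the stability of the left one because its $\Psi$-value is equal, not lower.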

The communities evaluated by $\Psi$ are connected subgraphs that grow by adding neighbouring nodes with all their links to the community. However, it is justified to treat these communities as link communities because the community's boundary consists of nodes. Expanding a link community means shifting its boundary from one set of nodes to another, while the node communities commonly discussed in the literature are expanded by shifting their boundaries from one set of links to another.

2.3 The $\Psi$-Landscape
of the Bow-Tie Graph

We demonstrate our approach with the bow-tie graph in Fig. 1 that has also been used by Evans and Lambiotte (2009). When we evaluate all its connected subgraphs with the normalised edge cut $\Phi$ we find that there is no local $\Phi$-minimum, which means that no node communities can be identified using this method. However, we expect to identify two link
4 If we imagine the relations to be located on a two-dimensional surface we cannot avoid that they cross each other. A better picture is therefore that of relations as cableways between places of different height.

5 This landscape concept is not restricted to link communities but can be used with all local evaluation functions for communities.




Figure 1: Bow-tie graph 




Figure 2: $\Psi$-landscape of the bow-tie graph. Ellipses correspond to subgraphs with $\Psi$-minima and rhombi correspond to seed links; the blue numbers are $\Psi$-values, the red ones id-numbers of nodes added on the way between two places.

communities consisting of the three links on the left-hand side and the right-hand side, respectively. Fig. 2 shows the graph's $\Psi$-landscape. In addition to the global minimum that corresponds to the whole graph there are indeed two local minima corresponding to the two expected link clusters. Their value $\Psi = 1/6$ is below the $\Psi$-values of the surrounding places in the landscape. From any of the six links of the graph, a local minimum is reached after one step in the landscape.

The same two link communities are found when the normalised edge cut $\Phi$ is applied as evaluation function to the bow-tie graph's line graph with weights $1/k_i$. 6 Applying the normalised node cut $\Psi$, the link communities can be identified directly without the construction of a line graph.

6 This can easily be derived. The bow-tie's line graph with weights $1/k_i$ is shown in figure 3 in the paper by Evans and Lambiotte (2009).

Figure 3: Height profile of a path through the bow-tie graph's $\Psi$-landscape starting from seed link $C = \{0, 1\}$ (the blue numbers denote the nodes added on the path, the height is the subgraphs' normalised node cut or node-boundary conductance $\Psi(C)$)

2.4 Identifying Communities 
with a Greedy Algorithm 

Constructing the whole $\Psi$-landscape is not feasible for larger networks. However, algorithms for the search of stable local minima in the $\Psi$-landscape can be constructed. We tested a greedy algorithm that adds those nodes that incur either the greatest reduction in $\Psi$ (i.e. go down the steepest slope in the $\Psi$-landscape) or the smallest increase in $\Psi$ (i.e. go up the gentlest slope in the $\Psi$-landscape).

Starting from a seed link, we go downhill in the $\Psi$-landscape on the path with the steepest slope. This slope is produced by adding to the subgraph the neighbouring node that incurs the greatest reduction in $\Psi$. If the $\Psi$-balance of two candidate nodes is tied we randomly select a path. Experiments showed that in most cases the two paths created by nodes with tied $\Psi$-balance will merge soon. If adding any neighbouring node to a subgraph increases $\Psi$, we search for members of the subgraph whose exclusion further reduces $\Psi$. If we don't find such members then we have reached a local $\Psi$-minimum. If we do, we prune the subgraph by excluding them and try again to add a neighbour which maximally reduces $\Psi$. If this trial fails then we have also reached a minimum.

After a local minimum has been reached, we continue to add nodes. Initially, nodes producing the smallest increase in $\Psi$ must be added for the algorithm to leave the hollow in the $\Psi$-landscape created by the local minimum. Thereafter, the search for the steepest slope can be resumed. This is repeated until we reach the ground state of the whole (connected) network with $\Psi = 0$. If all links of a network are used as seeds, there is a high likelihood that all local minima are found or at least all those with high stability.
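The downhill phase of such a search can be sketched in a few lines. This is a deliberately simplified illustration, not the implementation used for the experiments: it only adds nodes, it omits the pruning step and the continuation beyond a minimum, and it breaks ties deterministically by the smallest node id instead of randomly. The function names and the test graph (two 4-cliques sharing node 3) are assumptions.

```python
from fractions import Fraction
from itertools import combinations

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def node_cut(adj, C):
    """Normalised node cut Psi(C) of equation 1 for an unweighted graph."""
    sigma, k_in_total = Fraction(0), 0
    for i in C:
        k_in = len(adj[i] & C)
        k_out = len(adj[i]) - k_in
        sigma += Fraction(k_in * k_out, len(adj[i]))
        k_in_total += k_in
    return sigma / k_in_total

def greedy_descent(edges, seed_link):
    """Add the neighbour with the lowest Psi until no addition reduces Psi."""
    adj = build_adjacency(edges)
    C = set(seed_link)
    psi = node_cut(adj, C)
    path = [(frozenset(C), psi)]
    while True:
        neighbours = set().union(*(adj[v] for v in C)) - C
        candidates = [(node_cut(adj, C | {v}), v) for v in sorted(neighbours)]
        if not candidates:
            return path                      # the whole component has been absorbed
        best_psi, best_node = min(candidates)
        if best_psi >= psi:
            return path                      # no neighbour reduces Psi any further
        C.add(best_node)
        psi = best_psi
        path.append((frozenset(C), psi))

# Hypothetical test graph: two 4-cliques sharing node 3.
edges = list(combinations([0, 1, 2, 3], 2)) + list(combinations([3, 4, 5, 6], 2))
path = greedy_descent(edges, (0, 1))
for nodes, psi in path:
    print(sorted(nodes), psi)
# [0, 1] 2/3
# [0, 1, 2] 1/3
# [0, 1, 2, 3] 1/8
```

In this example the descent stops at the first clique because adding any node of the second clique would raise $\Psi$ from 1/8 to 1/7.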

Fig. 3 shows the profile of one path through the $\Psi$-landscape for the bow-tie graph. The algorithm starts from seed link $C = \{0, 1\}$. The steepest path leads to $C = \{0, 1, 2\}$, which corresponds to a local minimum of $\Psi$. The local minimum can be left by adding either node 3 or node 4. Both subgraphs have the same height in the $\Psi$-landscape, namely $\Psi = 5/24$. Starting from one of them, we reach the absolute minimum $\Psi(\{0, 1, 2, 3, 4\}) = 0$. In Fig. 3 and all subsequent $\Psi$-plots we let the curve start with the first node of the seed link because in some cases the seed link's $\Psi$-value already is a local minimum. Since one node has no internal links, its normalised node cut cannot be calculated. This is why we define $\Psi = 1$ for a single node connected with the rest of the graph.

For larger networks it takes much time to determine the path for each link. Computing time can be saved by updating the node sets and variables needed for the iterative procedure. In appendix A.2 we describe how the sum in the $\Psi$-function can be updated during the iteration. The C++ implementation of an optimised version of the algorithm written by Andreas Prescher has a creative-commons license.

3 Experiments 
3.1 Karate Club 

The club of 34 karate fighters observed by Zachary (1977) split up into two disjoint parts of equal size. The links between the fighters were weighted with their interactions at different places. The network of the karate club became a benchmark graph that is often used for testing cluster algorithms, including algorithms for the construction of overlapping communities. In our experiments, we used the unweighted version of the network.

Figure 4: Community $C_1$ has 29 nodes (blue) and 68 links (red). The ten grey links of community $C_4$ connect six nodes including the boundary node 1.

We applied the greedy algorithm described above to each of the 78 links of the network. The algorithm found seven local minima in the network's $\Psi$-landscape. Table 1 lists for each minimum the number of links, the number of nodes, the normalised node cut $\Psi$, and the number of seed links from which the minimum has been found.

The largest community $C_1$ with 29 nodes has only one boundary node, namely node 1 (see Fig. 4). The same one-node boundary delineates community $C_4$ with six nodes. The union of $C_1$ and $C_4$ covers the whole network; their intersection contains only node 1. Analogously, the union of $C_2$ and $C_3$ also covers the network. Their intersection contains their common boundary nodes, namely nodes 3, 9, 14, 20, 31, and 32 (see Fig. 5). The other three communities are subsets of larger communities: $C_5$ and $C_6$ of $C_2$, and $C_7$ of $C_1$ and $C_3$. Community $C_5$ has five nodes



Table 1: Communities found in the karate network

name   links   nodes   $\Psi$   seeds
C_1    68      29      .022     68
C_2    43      21      .077     40
C_3    41      19      .091     10
C_4    10      6       .150     10
C_5    6       5       .294     7
C_6    2       3       .460     2
C_7    1       2       .469     1



Figure 5: Community $C_2$ has 21 nodes (blue) and 43 links (red). Community $C_3$ has 19 nodes (connected by grey links). Both communities share six boundary nodes (connecting grey and red links).




Figure 6: Community $C_5$ has five nodes (blue) and six links (red).

as members (see Fig. 6), $C_6$ only three (nodes 3, 10, and 34). Community $C_7$ is the pair of nodes 1 and 12.
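The values reported for $C_1$ and $C_4$ in Table 1 can be cross-checked against each other. Assuming, as stated above, that node 1 is the only boundary node of both communities and that together they cover the whole network, node 1's internal degree in one community is its external degree in the other, so the sum $\sigma$ in equation 1 is the same for both. Taking the rounded table values at face value and using $k_{in}(C) = 2 \times$ (number of links), a rough check reads:

```latex
\sigma = \Psi(C_4)\, k_{in}(C_4) \approx 0.150 \cdot (2 \cdot 10) = 3.0,
\qquad
\Psi(C_1) = \frac{\sigma}{k_{in}(C_1)} \approx \frac{3.0}{2 \cdot 68} \approx 0.022
```

The result agrees with the value .022 listed for $C_1$, so the two table rows are consistent with equation 1.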

At least one local minimum of $\Psi$ was found in each run. 27 of the 78 seed links led to one local minimum, 42 to two local minima, and nine to three local minima. All ten links of community $C_4$ led to $C_3$ and $C_4$. As an example of the paths through the $\Psi$-landscape, Fig. 7 shows the plot of $\Psi$ over the number of nodes obtained by starting from seed link (1, 5). Since the $\Psi$-scale in Fig. 7 is logarithmic, the last point of the $\Psi$-curve, which represents the whole network with $\Psi = 0$, is not visible.

40 of 43 links of $C_2$ found $C_1$ and $C_2$. As an example, we show the $\Psi$-plot of seed link (33, 34) in Fig. 8. The two links of $C_6 \subset C_2$ also found $C_6$, and the six links of $C_5 \subset C_2$ also found $C_5$. In Fig. 9 we plot $\Psi$ of seed (25, 26).

Figure 7: $\Psi$-plot of seed link (1, 5) with minima of communities $C_4$ and $C_3$ (horizontal axis: number of nodes)

Figure 8: $\Psi$-plot of seed link (33, 34) with minima of communities $C_2$ and $C_1$ (horizontal axis: number of nodes)

Figure 9: $\Psi$-plot of seed link (25, 26) with minima of communities $C_5$, $C_2$, and $C_1$ (horizontal axis: number of nodes)

Figure 10: $\Psi$-plot of seed link (1, 2) with minimum of community $C_1$ (horizontal axis: number of nodes)

Community $C_5$ is also present as a local minimum when we start from seed (3, 28). However, in this case node 3 is excluded in the pruning process because excluding it further reduces $\Psi$. Community $C_7$ is identical to its seed link (1, 12). All other 27 links led the algorithm to only one local minimum, namely that of $C_1$. All these links belong to $C_3$. As a further example, we plot the $\Psi$-curve of seed (1, 2) in Fig. 10.

Two pairs of communities found by the greedy algorithm each cover the whole network and overlap in their boundary nodes, namely the pair $C_2$ and $C_3$ and the pair $C_1$ and $C_4$. The cut by the boundary nodes between $C_2$ and $C_3$ is compatible with the final split of the karate club and corresponds to one of the solutions found by Evans and Lambiotte (2009). Four of the six fighters on the boundary between $C_2$ and $C_3$ (nodes 3, 9, 14, 20) decided to follow Mr. Hi (node 1) and the other two (nodes 31 and 32) joined the officer's club (node 34).

Note that not only $C_2$ and $C_3$ overlap but also their link sets $L(C_2)$ and $L(C_3)$: $L(C_2) \cap L(C_3) = \{(3, 9), (9, 31)\}$. This is a special feature of our greedy algorithm based on the normalised node cut. The link clustering procedures proposed by Ahn et al. (2010) and by Evans and Lambiotte (2009) produce only disjoint link clusters.

In our example, two nested tree-like hierarchies of communities can be observed: 7

1. the whole graph splits into $C_1$ and $C_4$, which overlap in boundary node 1, and $C_1$ has several communities as subgraphs:

(a) $C_2$ with $C_5$ and $C_6$ as subgraphs,

(b) $C_7$,

2. the whole graph splits into $C_2$ and $C_3$, which overlap in six boundary nodes, and each has two communities as subgraphs:

(a) $C_2$ contains $C_5$ and $C_6$,

(b) $C_3$ contains $C_4$ and $C_7$.

$C_3$ is missing in the first hierarchy and $C_1$ in the second. These two communities cannot appear in the same nested tree-like hierarchy because they have a
7 Hierarchical link clustering also produces tree structures but only some of the many branches of the dendrogram are communities (Ahn et al. 2010; Havemann et al. 2012).




Figure 11: Polyhierarchy of all link communities of the karate-club network (symbolised by $C_0$). Lower-level communities are subgraphs of higher-level ones. When $C_1$ and the red or $C_3$ and the green link are omitted we obtain one of the two alternative tree-like hierarchies (see text).

permeating overlap, i.e. they share not only boundary nodes but also inner nodes. The occurrence of such permeating overlaps is also a new feature of our approach which is absent in the approaches proposed by Ahn et al. (2010) and by Evans and Lambiotte (2009). If permeating overlap is a realistic assumption we obtain one solution for the karate graph, but this solution is a polyhierarchy rather than a tree-like hierarchy because $C_7$ is a subgraph of both $C_1$ and $C_3$ (see Fig. 11). In most cases, combined subcommunities do not completely cover the higher-level communities they are part of. The only exception is the whole graph $C_0$, which is completely covered by the pair $(C_1, C_4)$ and also by the pair $(C_2, C_3)$.

3.2 Network of Papers and Sources 

We have also tested the algorithm on a larger network consisting of 492 papers from the 2008 volumes of six information-science journals and the about 14,000 sources cited by these papers. We made the network bipartite by neglecting that some papers are also cited sources. We analysed the undirected version of the citation network. For more details of the data see Havemann et al. (2011).

The complete findings will be published elsewhere. Here we only sketch some of the results obtained with 46 seed links which represent the 46 citations of Hirsch's 2005 paper in which he introduced the well-known h-index. Starting from these links the paths through the network's $\Psi$-landscape lead us to many communities of different sizes. Most of them are very unstable in the sense defined above: for an unstable community we find a nearby community with a lower $\Psi$-value. Among the most stable of these communities there is one with 42 papers and 409 sources. We compared these 42 papers with the 42 papers in the sample identified by us as dealing with the h-index (Havemann et al. 2012). There are three false positives and three false negatives, which gives a value of 93 % for both recall and precision. One of the false positives is only a partial member with a membership grade of 3 % (one of its 36 references points to a source in the community). All other 41 papers are full members. If we take the partial membership into account the result's precision improves to 95 %. With a match of 91 %, the h-community we obtained with hierarchical link clustering (HLC) on the same dataset is slightly less precise (Havemann et al. 2012, Table 1).
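The reported percentages can be retraced with a few lines of arithmetic. The way the partial membership enters the precision (the partial member counts with its membership grade of 1/36 in the community size) is an assumption made here because it reproduces the reported 95 %; the text does not spell out the formula.

```python
community_papers = 42   # papers in the community found by the algorithm
false_positives = 3     # community papers not dealing with the h-index
false_negatives = 3     # h-index papers missing from the community

true_positives = community_papers - false_positives             # 39
precision = true_positives / community_papers
recall = true_positives / (true_positives + false_negatives)

# Assumed weighting: 41 full members plus one partial member with grade 1/36.
membership_grade = 1 / 36
weighted_size = (community_papers - 1) + membership_grade
weighted_precision = true_positives / weighted_size

print(round(100 * precision), round(100 * recall))  # 93 93
print(round(100 * membership_grade))                # 3
print(round(100 * weighted_precision))              # 95
```

Precision and recall coincide here because the community and the reference sample both contain 42 papers.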

4 Summary and Conclusions 

We propose to see a community as being connected to the rest of the network via nodes and not via links. These boundary nodes can be shared by two or more communities. To evaluate overlapping communities we define a normalised node cut $\Psi$ analogously to the usual normalised cut $\Phi$ used to construct disjoint communities. We define a community $C$ as a connected subgraph corresponding to a local minimum in a $\Psi$-landscape over all connected subgraphs, which are linked by inclusion of a subgraph's neighbour or exclusion of a member. Applying a greedy algorithm, we found seven local $\Psi$-minima of the karate-club network from which nested hierarchies of overlapping communities can be constructed. One pair of communities overlapping in their boundary nodes is compatible with the final split of the karate club. Further tests on benchmark graphs are in preparation.

Normalised node cut $\Psi$ only uses degrees of nodes as input and can therefore also be calculated for weighted graphs. A greedy algorithm may not find all local minima in the $\Psi$-landscape of connected subgraphs. Whether local $\Psi$-minima represent useful communities depends on the part of reality we model with our network.

A Appendix
A.1 Proof of $\Psi(C) = \Phi(L)$

Normalised node cut $\Psi(C)$ of a subgraph in a network equals normalised edge cut $\Phi(L)$ of the corresponding subgraph in the network's line graph weighted according to Evans and Lambiotte (2009), as we will show now.

As in equation 2 we use $i, j = 1, \ldots, n$ to denote nodes and $k, l = 1, \ldots, m$ for links. With $L(C)$ we name the set of links between the nodes in $C$. If a link $k$ belongs to $L$ its membership $\mu_k(L) = 1$ and zero otherwise.

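We calculate the normalised edge cut $\Phi$ of a link set $L$ in the line graph as

$$\Phi(L) = \frac{K^{out}(L)}{K^{in}(L) + K^{out}(L)} \tag{6}$$

with the sum of internal degrees

$$K^{in}(L) = \sum_{k,l=1}^{m} \mu_k(L) E_{kl} \mu_l(L) = \sum_{k,l=1}^{m} \sum_{i=1}^{n} \mu_k(L) \frac{B_{ik} B_{il}}{k_i} \mu_l(L) \tag{7}$$

and the sum of external degrees

$$K^{out}(L) = \sum_{k,l=1}^{m} \mu_k(L) E_{kl} (1 - \mu_l(L)) = \sum_{k,l=1}^{m} \sum_{i=1}^{n} \mu_k(L) \frac{B_{ik} B_{il}}{k_i} (1 - \mu_l(L)), \tag{8}$$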

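5 http://www.r-project.org

cf. Havemann et al. (2012, eqs. 3 and 4). Now we use the relations

$$\sum_{k=1}^{m} \mu_k(L) B_{ik} = k_i^{in}(C(L))$$

and

$$\sum_{k=1}^{m} (1 - \mu_k(L)) B_{ik} = k_i^{out}(C(L)),$$

which directly follow from the definition of the incidence matrix $B$. Thus, we get

$$K^{in}(L) = \sum_{i=1}^{n} \frac{(k_i^{in}(C))^2}{k_i}$$

and

$$K^{out}(L) = \sum_{i=1}^{n} \frac{k_i^{in}(C)\, k_i^{out}(C)}{k_i}.$$

From this we easily derive the sum

$$K(L) = K^{in}(L) + K^{out}(L) = \sum_{i=1}^{n} k_i^{in}(C) = k_{in}(C)$$

and obtain

$$\Phi(L) = \frac{K^{out}(L)}{K(L)} = \frac{1}{k_{in}(C)} \sum_{i=1}^{n} \frac{k_i^{in}(C)\, k_i^{out}(C)}{k_i} = \Psi(C),$$

q. e. d.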
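As a worked example of the two sums, consider the left triangle $C = \{0, 1, 2\}$ of the bow-tie graph with its three links as $L$, assuming that node 2 is the node shared by both triangles. Nodes 0 and 1 then have $k_i = k_i^{in} = 2$ and $k_i^{out} = 0$, while node 2 has $k_2 = 4$ and $k_2^{in} = k_2^{out} = 2$:

```latex
K^{in}(L) = \frac{2^2}{2} + \frac{2^2}{2} + \frac{2^2}{4} = 5,
\qquad
K^{out}(L) = \frac{2 \cdot 0}{2} + \frac{2 \cdot 0}{2} + \frac{2 \cdot 2}{4} = 1,
\\
K(L) = K^{in}(L) + K^{out}(L) = 6 = k_{in}(C),
\qquad
\Phi(L) = \frac{1}{6} = \Psi(C)
```

This reproduces the value $\Psi = 1/6$ given for the two local minima of the bow-tie graph.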

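A.2 Updating $\Psi(C)$

Let $\sigma(C)$ be the sum in the $\Psi$-function and node $i$ a neighbour of $C$. For undirected networks the difference $\Delta^{+}\sigma(C) = \sigma(C \cup i) - \sigma(C)$ is given by

$$\Delta^{+}\sigma(C) = \sum_{j \in C} A_{ij} \frac{2 k_j^{out}(C) - A_{ij}}{k_j} - \frac{(k_i^{in}(C))^2}{k_i}.$$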

The denominator in the $\Psi$-function, $C$'s total internal degree $k_{in}(C)$, is increased by $2k_i^{in}(C)$ if neighbouring node $i$ is included into $C$. The factor 2 has to be used because in the total internal degree of $C \cup i$ each link is counted twice (for undirected networks). Note that including neighbour $i$ does not change its internal degree: $k_i^{in}(C \cup i) = k_i^{in}(C)$ (if there are no self-links). The sum can be restricted to boundary nodes $j \in \beta(C)$ because $A_{ij} = 0$ for inner members of $C$.



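If we exclude a boundary node $i$ the numerator $\sigma$ in the $\Psi$-function is changed by $\Delta^{-}\sigma(C) = \sigma(C \setminus i) - \sigma(C) = -\Delta^{+}\sigma(C \setminus i)$. Because $k_i^{in}(C \setminus i) = k_i^{in}(C)$ and $k_j^{out}(C \setminus i) = k_j^{out}(C) + A_{ij}$, we get

$$\Delta^{-}\sigma(C) = \frac{(k_i^{in}(C))^2}{k_i} - \sum_{j \in C \setminus i} A_{ij} \frac{2 k_j^{out}(C) + A_{ij}}{k_j}.$$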
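The two update rules can be checked against a full recomputation of $\sigma$. The sketch below is restricted to unweighted graphs, where $A_{ij}$ is 1 for linked nodes and 0 otherwise; the function names and the example graph (two triangles sharing node 2) are assumptions for illustration.

```python
from fractions import Fraction

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def sigma(adj, C):
    """Sum in the Psi-function: sum over members j of k_j_in * k_j_out / k_j."""
    return sum((Fraction(len(adj[j] & C) * len(adj[j] - C), len(adj[j])) for j in C),
               Fraction(0))

def delta_plus(adj, C, i):
    """sigma(C u {i}) - sigma(C) for a node i outside C, without recomputing sigma."""
    k_in_i = len(adj[i] & C)
    total = -Fraction(k_in_i ** 2, len(adj[i]))
    for j in adj[i] & C:                 # only members linked to i have A_ij = 1
        k_out_j = len(adj[j] - C)        # external degree of j before i is included
        total += Fraction(2 * k_out_j - 1, len(adj[j]))
    return total

def delta_minus(adj, C, i):
    """sigma(C \\ {i}) - sigma(C) for a member i of C, without recomputing sigma."""
    rest = C - {i}
    k_in_i = len(adj[i] & rest)
    total = Fraction(k_in_i ** 2, len(adj[i]))
    for j in adj[i] & rest:
        k_out_j = len(adj[j] - C)        # external degree of j before i is excluded
        total -= Fraction(2 * k_out_j + 1, len(adj[j]))
    return total

# Assumed example graph: two triangles sharing node 2.
adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)])
C = {0, 1, 2}
print(delta_plus(adj, C, 3), sigma(adj, C | {3}) - sigma(adj, C))                    # 1/4 1/4
print(delta_minus(adj, C | {3}, 3), sigma(adj, C) - sigma(adj, C | {3}))  # -1/4 -1/4
```

Each update touches only node $i$ and its neighbours inside $C$, which is where the saving over a full recomputation comes from.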